Perf: Optimize pages loading (Filecache path like approach)#2549
Conversation
e644e8e to
5d10baa
Compare
Signed-off-by: Kostiantyn Miakshyn <molodchick@gmail.com>
5d10baa to
5e6b2e9
Compare
mejo-
left a comment
There was a problem hiding this comment.
Thanks a lot @Koc, this looks really promising 🤩
I have some comments, but I'm genuinely curious what you think about the comments.
If you do further changes to the PR, could you do them in separate fixup commits (and don't force-push changes to the existing commit for now) so it's easier to review your changes?
| * @throws NotFoundException | ||
| * @throws NotPermittedException | ||
| */ | ||
| public function getPagesFromFolderV2(int $collectiveId, Folder $folder, string $userId, bool $recurse = false, bool $forceIndex = false): array { |
There was a problem hiding this comment.
Is there a reason to introduce this as a new function instead of replacing getPagesFromFolder() altogether? There's still one call left to the old function (lib/Service/PageService.php:984), was this left out on purpose?
I'd prefer to replace the old function and update the Unit tests along, as it would be great to not loose the existing unit testing for getPagesFromFolder() for the new function.
There was a problem hiding this comment.
I would say that this is temporary POC method to have possibility quickly switch between old an new implementation. So it's easier to add V2 prefix to call.
But, anyway, I guess that old implementation should be preserved for background jobs/console commands to re-create tree.
| * `subfolder/page.md`), so it matches the semantics of `File::getInternalPath()` | ||
| * within the jailed collective storage. | ||
| */ | ||
| class FileInfo { |
There was a problem hiding this comment.
Maybe better to call this CollectiveFileInfo to not confuse it with the public FileInfo API from server.
| $qb->select('fileid', 'storage', 'path', 'parent', 'name', 'mimetype', 'mimepart', | ||
| 'size', 'mtime', 'storage_mtime', 'encrypted', 'etag', 'permissions') | ||
| ->from('filecache') | ||
| ->where($qb->expr()->eq('storage', $qb->createNamedParameter($storageId, IQueryBuilder::PARAM_INT))) |
There was a problem hiding this comment.
Any reason to not further filter on mimetype here and only get folders and Markdown files? If I understand the code further down correctly, we only process folders and Markdown files anyway, right?
There was a problem hiding this comment.
My goal was to introduce a new index that will cover columns used in filter. So, in this case we should extend it with mime type then
| $pageFileIds[] = $fileInfo->fileId; | ||
| } | ||
| } | ||
| $pagesByFileId = $this->pageMapper->findByFileIds($pageFileIds); |
There was a problem hiding this comment.
I wonder whether this could become expensive if a few hundred fileIds get passed here. Not sure how well DB backends work with a large list of parameters to IN(). Did you test it on a collective with 500+ pages?
There was a problem hiding this comment.
Well, here the limit of max 999 (SQLite) or 1000 (Oracle) hits. So we'd have to chunk these queries. Probably best done inside PageMapper::findByFileIds and PageLinkMapper::findByPageIds so callers don't need to know. But can be done in a follow-up later on.
There was a problem hiding this comment.
Yeah, it should be fine from performance perspective. I will add workaround for 1000+ items (this is a known issue from other modules 😄 )
| } | ||
|
|
||
| // Create missing index page if folder or subfolders have page files (or forceIndex) | ||
| $subFolder = $collectiveFolder->getFirstNodeById($folderId); |
There was a problem hiding this comment.
I found the variable name subFolder missleading, as it's the folder object of the current folder. Why not call it folder instead?
| * @throws NotFoundException | ||
| * @throws \OCP\DB\Exception | ||
| */ | ||
| public function getFileCacheForCollective(int $collectiveId): array { |
There was a problem hiding this comment.
Wouldn't it be an option to pass the folder name here as well and further narrow down the query to only contain files in this subfolder? This would make the query less heavy when getPagesFromFolderV2() is called with a subfolder and not for the whole collective folder.
There was a problem hiding this comment.
This would be particularly helpful for building the templateFolder page tree, as that one doesn't need to process all the files from the collective rood folder.
| $this->setTimestamp($fileInfo->mtime); | ||
| $this->setSize($fileInfo->size); | ||
| $this->setFileName($fileInfo->name); | ||
| if ($collectivePath !== null) { |
There was a problem hiding this comment.
I guess these checks can be omitted as the default is null anyway, so setting them to null again doesn't hurt.
| /** | ||
| * Build the page info from a lightweight filecache entry (see PageService::getPagesFromFolderV2). | ||
| */ | ||
| public function fromFileInfo( |
There was a problem hiding this comment.
This copies much of the logic of fromFile(). Maybe worth refactoring to have less code duplication?
📝 Summary
This is alternative approach to #2390 that fixes same performance issue (closes #2380).
Benefits comparing to previous implementation:
So, we're just load all necessary pages via simple query
SELECT * FROM filecache WHERE storage_id = <storageId> AND path LIKE 'appdata_<instanceId>/collectives/<collectiveId>/%'🖼️ Screenshots
Collective with 390 pages with various nesting level
🚧 TODO
filecachetable tostorage_id, pathcolumns (but this requires extra PR to nextcloud/server)🏁 Checklist
npm run lint/npm run stylelint/composer run cs:check)🤖 AI (if applicable)